Primer Candidate Assessment

Work Sample: AI in Fundamental Investing, Research Quality & Customer Communication
Charlie Henderson · May 2026

A multi-model, cross-referenced submission covering all four assessment questions — plus a working product prototype built from reverse-engineering Primer's architecture.

Approach

Rather than answering each question in isolation, I treated this assessment as a single research project. The Zeus Capital podcast was transcribed and used to inform the Q1 thesis. Primer's four reports were cross-referenced against verified data from FMP API, Investegate, and analyst coverage. A competing report was generated using open-source models to test defensibility. The product was reverse-engineered from its frontend bundles (47 tools, 12 models, Visible Alpha integration). And a working product prototype was built embodying the multi-lens thesis from Q1.

Every factual claim is sourced. Every thesis was debated across 4 AI models before inclusion. The process itself demonstrates the method the memo advocates: no single AI output was trusted at face value.

Why multiple models matter: Primer currently offers GPT-5.x and Claude, but users select one at a time. This misses the core value of model diversity. GPT and Claude are trained on different data corpora, with different cutoff dates, different fine-tuning priorities, and different failure modes. DeepSeek is trained on substantially different data (Chinese + English web, different academic paper coverage). When two models agree, confidence is high. When they diverge, that divergence itself is the most valuable signal — it reveals where the training data is thin, the reasoning is ambiguous, or the question is genuinely hard. Running models in parallel and comparing outputs is fundamentally different from letting users choose one.
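The idea of treating divergence as a signal can be sketched in a few lines of Python. This is a minimal illustration: it assumes each model's answer to the same question has already been collected and parsed into a number. The model names, the example estimates, and the 10% threshold are hypothetical. The coefficient of variation is one simple spread measure among several that could be used.

```python
from statistics import mean, pstdev


def divergence_report(estimates: dict[str, float], threshold: float = 0.10) -> dict:
    """Summarise how far several models disagree on one numeric answer.

    Spread is the coefficient of variation (population std dev / |mean|);
    the 10% default threshold is an illustrative assumption.
    """
    values = list(estimates.values())
    centre = mean(values)
    spread = pstdev(values) / abs(centre) if centre else float("inf")
    # The model whose answer sits furthest from the group mean.
    furthest = max(estimates, key=lambda name: abs(estimates[name] - centre))
    return {
        "mean": centre,
        "relative_spread": spread,
        "diverges": spread > threshold,
        "furthest_from_mean": furthest,
    }


# Hypothetical FY27 PBT estimates (£m) from three models.
report = divergence_report({"model_a": 100.0, "model_b": 102.0, "model_c": 130.0})
print(report["diverges"], report["furthest_from_mean"])  # True model_c
```

A flag does not say which model is right. It only says where a human should look first.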

4 AI models debated · 63 min podcast transcribed · 12 security findings · 47 Primer tools mapped
Question 1

Written Memo: AI & Fundamental Investing

Central thesis: the traditional analyst is defunct. The new analyst is an "AI Output Analyst" trained to spot inconsistencies across models and synthesise correct output from multiple flawed AI inputs.

Podcast Transcription · 4-Model Debate · 3x Web Research
Question 2

Report Comparison: Pets at Home (PETS.L)

All 4 reports cross-referenced against FMP API actuals. Report D is best. All miss gross margin compression, negative tangible equity, and FCF. Open-source DeepSeek outperformed all 4 for $0.003.

FMP API Verified · 4-Model Debate · DeepSeek Replication
Question 3

Loom Video

90-second Primer pitch to a skeptical Head of Research. Then: why not ChatGPT, Claude, or AlphaSense? Talking points prepared from podcast + product analysis.

Insert Loom Link
Question 4

AI Tools Used

7 tools documented: Claude Code, multi-model debate, Groq Whisper, FMP API, DeepSeek, Playwright, web search. Honest assessment of where AI helped and where it didn't.

7 Tools · ~$0.35 Total AI Cost
Appendix

Discussion Points & Security Observations

6 product/architecture discussion points (not a build plan — questions I'd want to explore with the team) + 12 passive security findings across 4 categories.

2 High Severity · 12 Findings
Working Prototype

MultiLens Studio: Interactive Product Demo

7 interactive views built from reverse-engineering Primer's architecture. Includes Radar (proactive multi-lens monitoring that Primer doesn't have), architecture comparison, and real PETS.L data across all lenses.

47 Tools Reverse-Engineered · 12 Models Mapped · 7 Views
Question 1

Written Memo: AI and Fundamental Investing

How will AI change fundamental investing? Two-page maximum covering workflow changes, adaptation, new risks, and product value.

Sources & References
Zeus Podcast: Smallwood on AI in Research
Databricks: Jefferies AI Rollout (250+ users)
CFA Institute: Outperformed by AI?
PineBridge: Alpha in the AI Age (2026)
Seeking Alpha: The Great Commoditization
AI Hallucination Statistics 2026 (1-in-277)
Harvard: AI Hallucination Framework
FinRobot: Open-Source AI Research Agent

How Will AI Change Fundamental Investing?

Written Memo — Primer Candidate Assessment
Charlie Henderson · May 2026
How This Memo Was Built
Podcast transcription, multi-model debate, and web research — a live demonstration of the analytical method this memo advocates.

This memo was produced using the same multi-model, cross-referencing approach it argues analysts should adopt. The process is documented below as both transparency and proof of method.

Podcast Audio → Whisper Transcription → Thesis Extraction → Web Research (3x) → 4-Model Debate → Synthesis & Drafting

Step 1: Podcast Transcription & Thesis Extraction

Source: Zeus Capital podcast — "How AI is Transforming Equity Research Workflows" with Alistair Smallwood, Head of Applied AI at Primer (15 April 2026, 63 minutes)
Downloaded the MP3 from Transistor.fm, split into 7 chunks (each under the 25MB Whisper API limit), and transcribed all chunks via the Groq Whisper API (whisper-large-v3). Full transcript: 60,539 characters.
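The chunking step can be sketched as a planning function that splits on time rather than raw bytes. This sketch assumes a roughly constant bitrate, so file size scales with duration. The function name, the 0.9 safety factor, and the example file size are illustrative assumptions. This is not the exact procedure used here, which produced 7 chunks.

```python
import math

# The 25MB per-file upload limit cited above, read here as 25 MiB;
# the safety factor keeps chunks under either reading.
WHISPER_LIMIT_BYTES = 25 * 1024 * 1024


def plan_chunks(duration_s: float, file_bytes: int,
                limit_bytes: int = WHISPER_LIMIT_BYTES,
                safety: float = 0.9) -> list[tuple[float, float]]:
    """Return equal (start, end) windows, in seconds, that should each fit under limit_bytes.

    Assumes a roughly constant bitrate, so chunk size scales with chunk duration;
    the safety factor leaves headroom for bitrate variation and container overhead.
    """
    bytes_per_second = file_bytes / duration_s
    max_chunk_s = limit_bytes * safety / bytes_per_second
    n_chunks = max(1, math.ceil(duration_s / max_chunk_s))
    step = duration_s / n_chunks
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(n_chunks)]


# Hypothetical 63-minute, 60MB episode.
for start, end in plan_chunks(63 * 60, 60_000_000):
    print(f"{start:7.1f}s -> {end:7.1f}s")
```

Each window could then be cut with ffmpeg (for example `ffmpeg -ss START -t LENGTH -i episode.mp3 -c copy part.mp3`, with a hypothetical filename) and uploaded to the transcription endpoint. Splitting on time keeps each piece a valid audio file.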

Key positions extracted from the podcast:

  • AI commoditises information but not workflow or judgment: "there's no corpus of text that explains what to do when you see reverse factoring"
  • 70% of stock trading is fundamental, 30% is behavioural overlay; the behavioural edge remains human
  • Short-term alpha gets harder (quarterly earnings become "damp squibs" as everyone has same info), but long-term thinking (2yr+ horizons) becomes more valuable
  • LLMs are "hard programmed to choose the middle of the bell curve" — they output consensus by design
  • The scaling thesis: juniors covering 50 companies instead of 20, learning rate becomes exponential. "Teams of 3 become teams of 10, where the majority are non-human"
  • Capital formation could shift away from multi-manager back to single manager / longer-term thinking
  • Agent quality compounds via memory — inherits the analyst's mental model of a company over time

Step 2: Parallel Web Research

Search 1: AI in equity research workflows
Jefferies AI rollout to 250+ users, CFA Institute "Outperformed by AI" analysis, PwC 2026 predictions. Key finding: work that took days now takes minutes, but generating novel insights remains human territory.
Search 2: AI hallucination and bias in research
Harvard/MIT research on fabrication rates (1-in-277 papers by early 2026, a 6x increase since 2023), and a finding that 51% of AI-adopting organisations report negative consequences from inaccuracy.
Search 3: Alpha compression and information commoditization
PineBridge "Alpha in the AI Age", Seeking Alpha "The Great Commoditization", CFA qualitative data thesis. Key finding: quantitative edge diminishing; next frontier is qualitative analysis and AI application quality.

Step 3: Multi-Model Debate

Format: Structured debate across 4 independent models
Claude Sonnet 4.6, GPT-5.4, DeepSeek V4 Pro, and Gemini 2.5 Pro each independently argued the thesis vs counter-thesis, then Gemini synthesised the verdict. Total cost: ~$0.11.
Debate result: All four models rejected the "cover more stocks" counter-thesis. The split was between evolutionary adaptation (GPT and DeepSeek: "augment existing analysts") and revolutionary transformation (Sonnet and Gemini: "the role itself must change"). The synthesis sided with transformation. The key contribution from the debate was Gemini's framing of a valuable AI product as a "Contradiction Engine" rather than a summarisation engine, which I adopted directly.
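The fan-out half of this format can be sketched in Python. Each model is represented by a hypothetical callable that would wrap one provider's API. Here the callables are stubbed to return the positions reported above, so the sketch runs offline. The synthesis step, which would be a further call to one model with all four answers, is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature: one function per model, wrapping that provider's API call.
ModelFn = Callable[[str], str]


def run_debate(prompt: str, models: dict[str, ModelFn]) -> dict[str, str]:
    """Send the same prompt to every model in parallel; no model sees another's answer."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: future.result() for name, future in futures.items()}


def group_positions(answers: dict[str, str]) -> dict[str, list[str]]:
    """Group models by the position they took, so a split stays visible."""
    camps: dict[str, list[str]] = {}
    for name, position in answers.items():
        camps.setdefault(position, []).append(name)
    return camps


# Offline stubs returning the positions reported above.
stubs: dict[str, ModelFn] = {
    "Claude Sonnet 4.6": lambda prompt: "transformation",
    "GPT-5.4": lambda prompt: "evolution",
    "DeepSeek V4 Pro": lambda prompt: "evolution",
    "Gemini 2.5 Pro": lambda prompt: "transformation",
}
camps = group_positions(run_debate("Must the analyst role itself change?", stubs))
print(camps)
```

Grouping by position, rather than averaging the answers, keeps a 2-2 split visible as a split.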

Where AI Helped — and Where It Didn't

AI helped with: Research breadth (processing 63 minutes of audio, finding current statistics across three domains, running four parallel model perspectives on the same thesis). The multi-model debate surfaced two blind spots I incorporated: the primacy of proprietary data over proprietary models, and the organisational immune response to role transformation.

AI did not help with: The central thesis — that the analyst role must shift from processing to validation — is a judgment call informed by experience, not a model output. The models were split on how radical the transformation should be; the decision to take the stronger position was mine. The models also uniformly underestimated the organisational difficulty of this transition, which is where human experience of institutional change matters most.

This process is itself a demonstration of the method this memo advocates: no single AI output was trusted at face value. The final document is a human synthesis of multiple flawed AI outputs, cross-referenced against primary source material.

Central Argument: The traditional equity analyst, trained to build models, read filings, and synthesise commentary, is now performing tasks that AI does faster and more accurately. These skills have not become worthless; they have become table stakes. The analyst role must shift from information processing to information validation. The next-generation analyst is not someone who reads a 10-K. They are someone who can identify when the AI's reading of a 10-K is wrong, incomplete, or structurally biased.
01
What parts of the fundamental investing workflow will change most?
Modelling becomes automated. Information asymmetry shifts from "who read the filing" to "who spotted the AI's error." Short-term alpha compresses; long-term thinking wins.

Financial modelling becomes automated but requires adversarial oversight. An AI agent can build a three-statement model from SEC filings in minutes. Jefferies has rolled out AI research tools to 250+ users, turning days of work into minutes. But models are only as good as the assumptions embedded within them, and LLMs default to the statistical centre of their training distribution. The analyst's job is no longer to build the model; it is to stress-test the model's assumptions against what the AI cannot see: management credibility, competitive dynamics in flux, and non-linear strategic shifts that defy historical pattern-matching.

Information asymmetry shifts, not disappears. When every analyst had to manually read Note 150 in the Report and Accounts, spotting ageing receivables or reverse factoring was a genuine edge, because most competitors simply would not do the work. AI eliminates that asymmetry overnight. The new asymmetry is meta-cognitive: understanding what the AI knows, what it missed, and what it confidently hallucinated. That last problem is growing: fabricated citations now appear in 1-in-277 academic papers, a sixfold increase since 2023. Financial analysis is not immune to this failure mode.

Short-term alpha compresses. As AI democratises quarterly earnings analysis, the range of consensus outcomes narrows. Quarterly trading becomes a damp squib. The durable edge moves to two-year-plus horizons — regime change, management evolution, product pivots — where Bayesian reasoning about non-linear outcomes remains beyond current AI capability.

02
What parts will not change?
The behavioural overlay (~30% of how stocks trade) remains irreducibly human. Conviction and risk management cannot be outsourced to a model.

The behavioural overlay remains irreducibly human. Roughly 30% of how a stock trades is driven by behavioural dynamics: what is priced in, what the market believes it knows, and where positioning creates fragility. Determining "what is known and what is not known" — the core skill of active management — becomes harder, not easier, when AI is involved. Previously, you were second-guessing other humans; now you must second-guess AI agents whose reasoning is opaque and whose outputs are correlated across firms.

Conviction and risk management cannot be outsourced. An AI can generate a thesis. It cannot feel the weight of capital at risk. Portfolio construction, position sizing under uncertainty, and the discipline to hold (or cut) through volatility remain fundamentally human functions. No amount of AI sophistication changes the fact that investment is a decision made under irreducible uncertainty.

The value of management interaction shifts but persists. Management meetings are no longer about listening and taking notes — AI does that better. The new value is in asking the one question the AI cannot formulate: "Our models show a divergence between your stated CapEx priorities and your recent engineering hires in a non-core division. Can you explain this strategic ambiguity?" The analyst uses AI's complete data synthesis to identify the unknown unknowns, then uses scarce human interaction to probe those specific gaps.

03
How should analysts, PMs, and research teams adapt?
The next-generation analyst is an "AI Output Analyst" — trained in the epistemology of machine-generated knowledge, not the mechanics of data extraction.

The core thesis: analysts must become more developer than financial modeller. The traditional analyst skillset — Excel modelling, note-reading, management meeting attendance — is now table stakes that AI performs faster and more accurately. The scarce skill is no longer the ability to build a model; it is the ability to build the system that builds, verifies, and challenges the model. This is a fundamentally different competency, closer to software engineering than to traditional finance.

This requires four new competencies:

1. Agentic Workflow Design — the Analyst as Systems Architect
The ability to construct multi-step AI pipelines that cross-reference information, not just summarise it. For example: Agent 1 extracts supplier risk factors from peer 10-Ks; Agent 2 compares these against the target company's own disclosures; Agent 3 highlights discrepancies and formulates questions for management. This is systems design, not spreadsheet work. The analyst who can write a prompt chain that replicates a week of manual research in 10 minutes has an order-of-magnitude advantage over one who can build a slightly better Excel model.
2. Multi-Model Orchestration — Divergence as Signal
Running the same question through multiple model architectures (GPT, Claude, DeepSeek) and systematically identifying where they diverge. Each model is trained on different data corpora with different cutoff dates, biases, and failure modes. When they agree, confidence is high. When they diverge, that divergence is the most valuable signal in the analysis — it reveals where the training data is thin, the reasoning is ambiguous, or the question is genuinely hard. This is closer to forensic auditing than traditional research, and it requires understanding how models work, not just what they output.
3. Data Pipeline Construction — API Integration Over Manual Research
The next-generation analyst pulls balance sheet data from FMP, consensus from MarketScreener, filings from EDGAR, and news from Serper — not by visiting websites, but by building automated pipelines that feed verified data into their analysis. The analyst who can write a script to pull 3-year gross margins for a peer group in 30 seconds replaces the one who spends 3 hours copying numbers from annual reports. This is not optional technical literacy; it is the primary skill.
4. Bias and Structure Recognition
All information is structured with biases — both intentional (management framing, selective disclosure) and unintentional (training data gaps, model guardrails). The analyst must understand these biases at both the company reporting level and the AI model level, and synthesise a correct output from multiple flawed inputs. Understanding why GPT and Claude produce different answers to the same question is as important as understanding why a company's adjusted EBITDA differs from statutory.
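The three-agent example in item 1 can be sketched as a small Python pipeline. Each stage is stubbed: a hypothetical keyword taxonomy stands in for the LLM extraction step (Agent 1), set differences do the comparison (Agent 2), and a template turns gaps into management questions (Agent 3). All peer names, disclosure snippets, and keyword lists are made up for illustration. A real version would replace `extract_risks` with a model call.

```python
# Hypothetical keyword taxonomy standing in for an LLM extraction step.
RISK_KEYWORDS: dict[str, tuple[str, ...]] = {
    "supply chain": ("supplier", "supply chain", "sourcing"),
    "regulatory": ("regulator", "investigation"),
    "cyber": ("cyber", "data breach"),
}


def extract_risks(text: str) -> set[str]:
    """Agent 1 (stub): which risk categories does a disclosure mention?"""
    lower = text.lower()
    return {risk for risk, words in RISK_KEYWORDS.items() if any(w in lower for w in words)}


def silent_risks(target: str, peers: dict[str, str], min_peers: int = 2) -> dict[str, list[str]]:
    """Agent 2: risks flagged by at least min_peers peers but absent from the target's disclosure."""
    target_risks = extract_risks(target)
    flagged_by: dict[str, list[str]] = {}
    for peer, text in peers.items():
        for risk in sorted(extract_risks(text) - target_risks):
            flagged_by.setdefault(risk, []).append(peer)
    return {risk: names for risk, names in flagged_by.items() if len(names) >= min_peers}


def management_questions(gaps: dict[str, list[str]]) -> list[str]:
    """Agent 3: turn each disclosure gap into a question for management."""
    return [
        f"{len(names)} peers ({', '.join(names)}) flag {risk} risk. Why is it absent from your disclosure?"
        for risk, names in sorted(gaps.items())
    ]


# Made-up disclosure snippets for illustration.
target = "Cyber attacks and weather are our principal risks."
peers = {
    "Peer A": "Disruption at a key supplier could delay stock.",
    "Peer B": "Our supply chain is concentrated in one region.",
    "Peer C": "Sourcing costs and a cyber incident are principal risks.",
    "Peer D": "A regulator investigation may affect pricing.",
}
for question in management_questions(silent_risks(target, peers)):
    print(question)
```

Raising `min_peers` trades recall for precision: a risk flagged by one peer may be idiosyncratic, while one flagged by most peers makes the target's silence harder to explain.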

The hiring implication: The next analyst hire should be evaluated on their ability to construct an agentic workflow, not on their ability to build a DCF in Excel. PMs should reduce coverage breadth and increase conviction depth. The firms that win will not be those covering 500 stocks with AI assistance; they will be those covering 50 stocks with AI-validated, multi-model, adversarially-tested theses — built by analysts who are as comfortable with an API as they are with an annual report.

04
What new risks does AI introduce?
Model monoculture, automation complacency, adversarial data poisoning, and hallucination propagation create an entirely new class of systemic risk.
Model Monoculture: If the top 20 hedge funds build on the same 2–3 foundation models, a single flaw or data poisoning event creates a correlated, systemic market risk with no historical precedent.
Automation Complacency: The "human-in-the-loop" degrades to the "human-clicking-OK." Junior analysts trained on AI outputs from day one may never develop the pattern recognition that comes from manually working through accounts.
Adversarial Manipulation: When AI agents automatically process press releases and filings, deliberately poisoning these data streams becomes a viable form of market manipulation. Subtle linguistic manipulation designed to exploit known LLM biases is an emerging attack vector.
Hallucination Propagation: AI-generated "facts" enter the information ecosystem and get treated as verified data points by downstream systems. A fabricated data point in one model's output can propagate through agentic chains with no common audit trail.
05
What would make an AI research product genuinely valuable rather than just a faster summarisation tool?
A Contradiction Engine, not a summarisation engine. The scarce capability is not "what did the CEO say?" but "where is the narrative inconsistent with the data?"

The market is flooded with tools that summarise earnings calls and extract financial data. These are commodities. A genuinely valuable product would:

Cross-reference management statements across quarters to identify shifted narratives and broken commitments — the linguistic equivalent of forensic accounting.
Construct adversarial "red team" analyses against a user's own thesis, systematically identifying the strongest counter-arguments and most likely failure modes.
Surface discrepancies between a company's disclosures and its peers' risk factors — if three of your four competitors flag supply chain risk and you don't, that silence is informative.
Maintain an auditable chain of reasoning that distinguishes sourced facts from imputed assumptions from model-generated estimates, so every number carries a confidence provenance.
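One way to make "confidence provenance" mechanical is to attach a provenance tag to every number. Any derived figure then inherits the weakest tag among its inputs. The three-tier scheme, the class names, and the 46% margin input below are hypothetical illustrations. Only the £1,469.6m revenue figure comes from this submission's verified data.

```python
import operator
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Provenance(IntEnum):
    """Hypothetical three-tier scheme; a higher value means more trustworthy."""
    MODEL_ESTIMATE = 1  # generated by a model, no traceable source
    IMPUTED = 2         # derived from an explicitly stated assumption
    SOURCED = 3         # taken directly from a filing or verified data feed


@dataclass(frozen=True)
class Figure:
    value: float
    provenance: Provenance
    trail: tuple[str, ...]  # audit trail of where the number came from

    def combine(self, other: "Figure", op: Callable[[float, float], float], step: str) -> "Figure":
        """Derive a new figure; it inherits the weakest provenance of its inputs and both trails."""
        return Figure(
            value=op(self.value, other.value),
            provenance=min(self.provenance, other.provenance),
            trail=self.trail + other.trail + (step,),
        )


revenue = Figure(1469.6, Provenance.SOURCED, ("FY26 revenue (£m), FMP",))
margin = Figure(0.46, Provenance.MODEL_ESTIMATE, ("gross margin, hypothetical model estimate",))
gross_profit = revenue.combine(margin, operator.mul, "gross profit = revenue x margin")
print(round(gross_profit.value, 1), gross_profit.provenance.name)
```

Because the derived gross profit is tagged MODEL_ESTIMATE, it cannot later be mistaken for a sourced number, and its trail shows exactly which input weakened it.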

The product that simply makes analysts faster is a commodity waiting to be competed away. The product that makes analysts more rigorous — that systematically reduces the probability of being confidently wrong — is the one that earns its seat in a professional workflow.

Question 2

AI-Generated Research Report Comparison

Pets at Home Group plc (PETS.L) — FY26 Pre-Close. Four reports cross-referenced against verified financial data from FMP API, Investegate, and analyst coverage.

Reports Reviewed
Report A — Production baseline (Score: 0/5)
Report B — Testing/prod (Score: 0/5)
Report C — Testing/high (Score: +1/5)
Report D — Testing/xhigh (Score: +1/5)
Verification Sources
Investegate: FY26 Prelim Results
FMP API: Income Statement
FMP API: Balance Sheet
FMP API: Cash Flow
MarketScreener: Analyst Consensus
TipRanks: Price Targets
Pets at Home: IR Page
Zeus Podcast: Smallwood Interview
How This Comparison Was Built
Cross-referenced against verified financial data from 4 independent sources, then debated across 4 AI models.

Data Sources Used

Financial Modeling Prep (FMP) API — Income statement, balance sheet, cash flow, ratios, and growth metrics for FY24-FY26. Provided verified revenue (£1,469.6m), statutory PBT (£86.5m), gross margin (45.7%), net debt (£357m inc. leases), FCF (£147m), and the critical goodwill figure (£960m).
Investegate / Company RNS — FY26 Preliminary Results announcement (27 May 2026). Underlying PBT £92.8m, DPS 7.4p (-43.1%), underlying EPS 14.8p, consumer revenue by segment.
Analyst coverage data — MarketScreener, TipRanks, web research. 11 analysts covering: Jefferies (Buy, 265p), Canaccord (Buy, 245p), Berenberg (Hold), Peel Hunt (Hold). Consensus avg target: 222p vs current 192p.
Multi-model debate — Claude Sonnet, GPT-5.4, DeepSeek V4 Pro, Gemini 2.5 Pro independently evaluated all four reports against verified data. Unanimous verdict on best/worst. Cost: ~$0.11.
DeepSeek V4 (open-source replication) — Ran the same analysis through DeepSeek V4 (Apache 2.0, open-source) via API, feeding it the multi-source verified data. The output was compared head-to-head against all four Primer reports to test defensibility. Cost: $0.003. Time: 8 seconds.
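A minimal version of the FMP pull used here might look like the Python sketch below. It assumes FMP's v3 `income-statement` endpoint and its `revenue` and `grossProfit` fields. FMP has versioned its API over time, so the URL and field names should be checked against the current documentation. The fetch function makes a network call and needs an API key. The arithmetic helpers can be checked offline against the FY25 and FY26 gross margins (46.9% and 45.7%) and the £1,469.6m revenue.

```python
import json
import urllib.request

# Assumption: FMP's v3 income-statement endpoint, returning a JSON list of annual
# statements with "revenue" and "grossProfit" fields. Check the current FMP docs.
FMP_URL = ("https://financialmodelingprep.com/api/v3/income-statement/"
           "{symbol}?period=annual&limit={years}&apikey={key}")


def fetch_income_statements(symbol: str, api_key: str, years: int = 3) -> list[dict]:
    """Download the latest annual income statements for one ticker (network call)."""
    url = FMP_URL.format(symbol=symbol, years=years, key=api_key)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def gross_margins(statements: list[dict]) -> list[float]:
    """Gross margin for each statement, in the order returned."""
    return [s["grossProfit"] / s["revenue"] for s in statements]


def margin_change_bp(previous: float, current: float) -> float:
    """Margin change in basis points (negative means compression)."""
    return (current - previous) * 10_000


def gross_profit_impact(revenue: float, change_bp: float) -> float:
    """Gross profit gained or lost from a margin change at constant revenue."""
    return revenue * change_bp / 10_000


# Offline check against the FY25 -> FY26 figures.
bp = margin_change_bp(0.469, 0.457)
print(round(bp), round(gross_profit_impact(1469.6, bp), 1))  # -120 -17.6
```

For a peer group, `fetch_income_statements` would be called once per ticker and each result passed to `gross_margins`. The offline check reproduces the 120bp compression and the £17.6m gross profit impact.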

Verification Method

Each claim in each report was cross-referenced against at least two independent data sources. Discrepancies were flagged with the magnitude of error and the likely cause. A competing report was generated using open-source tools to test whether the output quality is a function of the model or the data pipeline. The goal was not just to identify which report is "best" but to demonstrate the type of verification work that distinguishes genuine analysis from polished AI output.

Summary Verdict: Report D is the strongest; it is the only report that demonstrates genuine analytical thinking beyond restating company disclosures. Reports A and B are effectively identical and provide zero incremental value over a raw data feed. However, all four reports share critical blind spots: none address gross margin compression, balance sheet quality, cash flow generation, or valuation context. The difference between the reports is the difference between "bad" and "less bad"; none would satisfy a competent buy-side analyst without significant supplementary work.
Best Report
Report D — most analytical depth, identifies disclosure gaps
Weakest Report
Report B — identical to A with zero added value
Most Useful to Analyst
Report D — but only with manual supplementation
Best Investment Insight
Report C — "stabilisation, not recovery proof"
01
Which report is best, and why? Which report is weakest, and why?
Report D is best (original analysis, identifies gaps). Report B is worst (carbon copy of A with no differentiation). The gap between C and D is narrow; the gap between A/B and C/D is a chasm.
1st — Report D
Most Comprehensive

Profit mix table with FY25 comparison. Quantified central + insurance drag (£21m deterioration). Includes original guidance context (£115-125m → £92m). Observation that the company issued a "deliberately narrow statement" shows critical thinking about disclosure strategy. Best at identifying what data is missing and why that matters.

2nd — Report C
Best Editorial Judgment

Adds FY27-28 estimates (£100m/£115m) and bear/base/bull scenarios. Margin sensitivity analysis (50bp = £6-7m PBT) is useful original work. "Stabilisation rather than recovery proof" is the single best editorial line across all four reports. DPS estimate of -25-30% was materially wrong (actual: -43.1%).

3rd — Report A
Data Regurgitation

Restates company disclosures without original analysis. Notes the retail consensus miss (£51.1m vs £30m actual) but attributes it to "definition mismatches" without investigation. No forward estimates, no scenarios, no balance sheet analysis, no valuation context. A raw data feed with formatting.

4th — Report B
Zero Added Value

Nearly identical to Report A in structure, data, and conclusions. Same "definition mismatch" hand-wave. If two AI-generated reports are indistinguishable, one of them should not exist. Report B adds nothing that Report A does not already provide.

02
What factual, numerical, or reasoning issues do you see?
Cross-referenced every claim against FMP API data, company RNS, and analyst coverage. Key errors: all reports underestimate the DPS cut, none address gross margin compression (120bp), and the "definition mismatch" excuse avoids a critical question.
Metric | Report claims | Verified actual | Verdict
Underlying PBT | All: "c£92m" | £92.8m | Accurate (within rounding)
Vet Group PBT | All: "c£83m" | ~£83m | Accurate
FY25 Retail PBT | All: "£72.9m" | £72.9m | Accurate
Net Debt | All: "c£20m" | £357m (inc. leases); ~£20m (ex-leases) | Correct per company definition, but no report flags the £357m lease-inclusive net debt or the definitional difference
DPS Change | C: "-25-30%"; D: "-25-35%" | 7.4p (-43.1%) | Arithmetic failure. All reports cite the 50% payout rebase (announced at pre-close) but then estimate the DPS cut as a range instead of calculating it. 50% × ~14.5p estimated EPS = ~7.25p, from 13p = -44%. The answer was derivable from the company's own stated policy. This is not an estimation error; it is a failure to perform the calculation.
Revenue | All: "Not disclosed" | £1,469.6m (-0.8%) | Correct that pre-close omitted revenue, but no report flagged this as an analytical risk
Gross Margin | Not mentioned | 45.7% (from 46.9%) | 120bp compression entirely missed. £17.6m gross profit impact not discussed.
FCF | Not mentioned | £147m | Strong cash generation ignored. FCF yield of 17.8% is decision-relevant.
Goodwill / Tangible Equity | Not mentioned | £960m / -£8.6m | Total intangible assets (goodwill £960m + other £22m = £982m) exceed total equity (£973m). Tangible book value is negative. Major balance sheet risk entirely unaddressed.
Valuation | Not mentioned | EV/EBITDA 6.05x | No valuation context whatsoever. Current price 192p vs consensus target 222p (15% upside).
Retail consensus miss | A/B: "£51.1m vs £30m, definition mismatch" | £30m actual | The £21m gap deserves investigation, not a hand-wave. See analysis below.
The "Definition Mismatch" Problem
Reports A and B note that retail consensus was £51.1m versus actual £30m — a 41% miss — but dismiss this as a "definition mismatch (segment PBT vs operating income)." This is the analytical equivalent of shrugging. A competent report would: (1) attempt to reconcile the definitions, (2) flag the magnitude as material regardless of definition, (3) investigate whether the miss reflects genuine operational deterioration that consensus hadn't priced in, and (4) frame it as a key question for management. Instead, the reports treat a £21m miss as a data quality footnote rather than a fundamental analytical finding.
The Negative Tangible Equity Risk
The most significant omission across all reports. Pets at Home carries £960m of goodwill and a further £22m of other intangibles against £973m of total equity, leaving tangible equity at negative £8.6m. This means the company's entire book value rests on the assumption that its acquired businesses (primarily veterinary practices) are worth what was paid for them. With underlying PBT down 30% and retail profits down 59%, the question of whether a goodwill impairment review is warranted is not academic: it is the single most important balance sheet question an analyst should be asking. No report raises it.
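The tangible equity arithmetic is simple enough to state in code. The inputs are the rounded £m balances quoted in the verification table: equity £973m, goodwill £960m, and other intangibles £22m. With these inputs the result is about -£9m. The -£8.6m figure presumably reflects unrounded balances. Goodwill alone is below total equity; the additional other intangibles are what push tangible equity negative.

```python
def tangible_equity(total_equity: float, goodwill: float, other_intangibles: float) -> float:
    """Book equity left once goodwill and other intangible assets are stripped out."""
    return total_equity - goodwill - other_intangibles


# Rounded £m balances from the verification table.
equity, goodwill, other = 973, 960, 22
print(equity - goodwill)                          # 13: net of goodwill only
print(tangible_equity(equity, goodwill, other))   # -9: net of all intangibles
print(f"{goodwill / equity:.1%} of equity is goodwill")
```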
03
Which report would be most useful to an equity analyst?
Report D, but only as a starting point. It would save ~20 minutes of initial data gathering but would require 2+ hours of supplementary work on balance sheet, cash flow, valuation, and margin analysis before a management meeting.

For a buy-side analyst preparing for a management meeting, Report D provides the best foundation because it identifies the right questions: why was disclosure so narrow? What is the insurance drag trajectory? Why does the segment profit bridge not add up cleanly? These are productive starting points for management engagement.

Report C's editorial judgment is sharper. The phrase "stabilisation rather than recovery proof" is the kind of conclusion an analyst needs — it frames the investment debate correctly and tells you what to watch for in the prelims. Report C also provides the only margin sensitivity analysis (50bp retail margin = £6-7m group PBT impact), which is directly useful for scenario modelling.

What an analyst would still need to do manually:

Balance sheet review — Pull the balance sheet from Companies House or FMP. Discover the £960m goodwill, negative tangible equity, and £397m total debt. None of the reports do this.
Cash flow analysis — FCF of £147m against £92.8m underlying PBT (roughly 1.6x) implies strong cash conversion, and operating cash flow is 2.05x PBT. This is a positive signal that partially offsets the earnings decline. Not mentioned in any report.
Valuation context — At 192p, EV/EBITDA of 6.05x with 17.8% FCF yield. Compare to Frasers Group (5.2x), CVS Group (12.4x), or B&M European (7.1x). No report provides peer comparison.
Analyst consensus cross-check — 11 analysts cover PETS.L. Split: 5 Buy, 3 Hold, 3 Sell. Average target 222p (15% upside). Jefferies at 265p (Buy), Peel Hunt at Hold. This consensus data is essential context for any investment discussion.
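The ratios behind these checks can be reproduced with a few lines of Python, using only figures quoted in this section. The helper name is illustrative. The peer median uses the three EV/EBITDA multiples listed for Frasers, CVS, and B&M.

```python
from statistics import median


def upside_pct(price: float, target: float) -> float:
    """Percentage move from the current price to a target price."""
    return (target / price - 1) * 100


fcf_to_pbt = 147 / 92.8                    # FCF / underlying PBT (£m)
consensus_upside = upside_pct(192, 222)    # share price vs average target (p)
peer_median = median([5.2, 12.4, 7.1])     # Frasers, CVS, B&M EV/EBITDA
discount_pct = (6.05 / peer_median - 1) * 100
print(f"{fcf_to_pbt:.2f}x, {consensus_upside:.1f}%, {peer_median}x, {discount_pct:.1f}%")
```

On these inputs, FCF is about 1.58x underlying PBT. The 222p consensus target implies about 15.6% upside, and 6.05x EV/EBITDA sits about 15% below the 7.1x peer median.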
04
What important context is missing?
Eight critical items that no report addresses. The missing balance sheet analysis alone would change the risk assessment of this stock.
# | Missing item | Why it matters | Verified data
1 | Gross margin compression | 120bp decline signals pricing/cost pressure that directly impacts the recovery thesis | 45.7% (from 46.9%)
2 | Free cash flow | £147m FCF against £92.8m PBT shows strong cash conversion, a bullish signal hidden by the earnings decline | £147m (17.8% yield)
3 | Goodwill / tangible equity | £960m goodwill = 98.7% of equity. Tangible equity is NEGATIVE. Impairment risk is material with declining profits | -£8.6m tangible equity
4 | Total debt (inc. leases) | Reports use the company's ~£20m ex-lease figure without flagging the £397m total. Net debt/EBITDA of 1.83x is moderate but worth discussing | £397m total debt
5 | Valuation multiples | No EV/EBITDA, P/E, or FCF yield. Without valuation, a research report is just a news summary | 6.05x EV/EBITDA
6 | Analyst consensus & coverage | 11 analysts, split verdict (5 Buy / 3 Hold / 3 Sell), avg target 222p. Essential for positioning a view | 222p avg (15% upside)
7 | Share count reduction | 463.5m → 454.4m shares via buybacks. Affects EPS calculation and per-share metrics | -2.0% share count (c.+2% EPS accretion)
8 | Historical earnings trajectory | FY24 PBT £105.7m → FY25 £120.6m → FY26 £86.5m (statutory). The FY25 peak and FY26 collapse tell a story none of the reports contextualise | 3-year statutory PBT trend
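The share count and PBT trajectory figures in rows 7 and 8 can be checked with a small helper. This Python sketch uses only the figures quoted in the table. The helper name is illustrative.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after."""
    return (after / before - 1) * 100


share_count_change = pct_change(463.5, 454.4)   # shares in issue, m
eps_uplift = pct_change(454.4, 463.5)           # EPS effect at constant earnings
statutory_pbt = {"FY24": 105.7, "FY25": 120.6, "FY26": 86.5}  # £m
years = list(statutory_pbt)
pbt_steps = {f"{a} to {b}": round(pct_change(statutory_pbt[a], statutory_pbt[b]), 1)
             for a, b in zip(years, years[1:])}
print(round(share_count_change, 1), round(eps_uplift, 1), pbt_steps)
```

The share count falls about 2.0%, which lifts EPS by about 2.0% at constant earnings. Statutory PBT rose about 14.1% into FY25 and then fell about 28.3% in FY26.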
05
Which claims feel unsupported, generic, or overconfident?
The reports exhibit a pattern of accepting management framing uncritically, rationalising misses rather than investigating them, and making forward estimates without sufficient basis.
All reports: "Net debt c£20m"
Uncritically adopts the company's ex-lease definition without mentioning the £397m total debt, £357m net debt (inc. leases), or the 1.83x net debt/EBITDA ratio. This is not factually wrong, but presenting only the favourable definition without the alternative is a failure of analytical balance.
Reports A/B: "Definition mismatch" on retail consensus
A £21m miss (41% below consensus) is explained away as a definitional issue. This is the most dangerous type of AI output: a plausible-sounding rationalisation that prevents the analyst from asking the right question. The right response is to flag the magnitude, investigate the cause, and prepare management questions — not to explain it away.
Report C: DPS impact of "-25-30%"
Actual DPS cut was -43.1% (7.4p from 13.0p). The estimate was materially wrong. More importantly, the reasoning was not shown: a good report would derive DPS from a stated payout ratio (50%) against forecast EPS, not estimate it as a percentage range. The actual 7.4p DPS against 14.8p underlying EPS implies a 50% payout — exactly what management signalled. The AI should have calculated this.
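The calculation the reports skipped is a single multiplication. This Python sketch applies the 50% payout policy to two EPS figures: the ~14.5p pre-results estimate and the 14.8p reported underlying EPS. It then compares each result with the prior 13.0p DPS. The function names are illustrative.

```python
def dps_from_payout(eps_p: float, payout_ratio: float) -> float:
    """Dividend per share (pence) implied by applying a payout ratio to EPS."""
    return eps_p * payout_ratio


def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after."""
    return (after / before - 1) * 100


PRIOR_DPS_P = 13.0
for label, eps in (("pre-results estimate", 14.5), ("reported underlying", 14.8)):
    dps = dps_from_payout(eps, 0.50)
    print(f"{label}: {dps:.2f}p, {pct_change(PRIOR_DPS_P, dps):.1f}%")
```

The estimate implies a cut of about 44%, and the reported figure implies a cut of about 43.1%. Both fall well outside the 25-30% range.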
All reports: CMA as "benign" / "no adverse impact"
All four reports accept the company's characterisation of the CMA outcome without independent analysis. The CMA's veterinary market investigation is a material regulatory event, and simply restating management's framing of it as positive is not analysis — it is PR.
Report C/D: FY28 PBT estimates (£110-115m)
Two-year-forward estimates made without revenue data, margin trends, or competitive context. These numbers are presented with implied precision that the available data does not support. The range should be explicitly wider, or the estimate should be presented with clearly stated assumptions and confidence intervals.
06
Which report best identifies what actually matters from an investment perspective?
Report C frames the investment debate most clearly. Report D provides the best analytical foundation. Neither addresses the three things that would actually drive a buy/sell decision: valuation, balance sheet quality, and cash flow sustainability.

What actually matters for an investment decision on PETS.L:

1. Is the retail recovery real or cosmetic? Report C's framing — "stabilisation rather than recovery proof" — is exactly right. Full-year retail PBT at £30m is 59% below FY25. H2 improvement is encouraging but H2 retail PBT of ~£26.5m is still roughly half of FY25 H2. The prelims need to show revenue trajectory, LFL breakdown, and margin progression. None of the reports push hard enough on this.
2. Is the valuation pricing in the downside? At 192p, EV/EBITDA of 6.05x and FCF yield of 17.8% suggest the market has priced in significant pessimism. If FY27 PBT reaches £99m consensus and the share count continues to shrink via buybacks, EPS accretion could be meaningful. None of the reports provide this context.
3. Is the balance sheet sound? £960m of goodwill on a company generating £92.8m of underlying PBT raises the question of whether the acquired vet practices are delivering adequate returns on the capital deployed to acquire them. With tangible equity negative, any impairment would directly reduce book value and potentially trigger covenant concerns. This is the question no report asks.
4. What is the capital allocation framework? Management rebased the dividend to 50% payout, launched a £50m buyback, and has £147m of FCF. This suggests confidence in cash generation despite the profit decline. The reports note the dividend rebase but don't connect it to the FCF story or the buyback implications for per-share value creation.

Bottom line: Report C best identifies the qualitative investment debate. Report D best identifies the analytical gaps. Neither comes close to what a competent human analyst would produce, because both are fundamentally constrained by what the company chose to disclose, rather than investigating what it chose not to.

07
Multi-Model Debate: Cross-Validated Verdict
Four AI models independently evaluated the reports against verified data. Unanimous on Report D as best, A/B as worst. Key insight from Gemini: the reports demonstrate a "failure of cross-statement analysis" — none connect P&L, balance sheet, and cash flow.
Model | Best | Worst | Key critique
GPT-5.4 | D | A | "Report D provides comprehensive context; A fails to offer substantive analysis beyond restating data"
Claude Sonnet | D | A | "Negative tangible equity is the smoking gun none addressed — PETS is trading on £960m of acquisition goodwill with declining profitability"
DeepSeek V4 | D | B | "B is plagiarised trash — pure copy of A with zero added value. Goodwill is 99% of equity — technically insolvent if impaired."
Gemini 2.5 Pro | D | A/B | "Even the best report operates at a superficial level. The AI failed a simple capital allocation derivation: given PBT, FCF, net debt, and a buyback, management would prioritise buyback over dividend."
Gemini's architectural critique
"None of the reports connected the three financial statements. The massive £960m in Goodwill directly leads to the negative tangible equity and presents a major impairment risk. A competent system must link P&L, balance sheet, and cash flow — this is a failure of cross-statement analysis, not just a missing data point."

The debate revealed that the gap between the reports is less interesting than their shared failures. The AI models generating these reports appear unable to perform the foundational task of equity analysis: connecting financial statements to each other and deriving implications that are not explicitly stated in the source text.

08
Our Report: Multi-Source Analysis vs Single-Document Extraction
We generated a competing report using DeepSeek V4 (open-source) fed with verified data from 4 sources. Below is the full report with source annotations, followed by a head-to-head comparison against Reports A-D.

Model: DeepSeek V4 (Apache 2.0, open-source)  |  Cost: $0.003  |  Time: 8 seconds  |  Data sources: FMP API, Investegate RNS, MarketScreener, TipRanks

Pets at Home Group plc (PETS.L) — Earnings Briefing

FY26 (52 weeks to 26 March 2026)  |  Generated from multi-source verified data

Headline: A Year of Two Halves, but Structural Questions Remain. Pets at Home delivered a weak FY26, with underlying PBT of £92.8m falling 30% YoY and well below the original guidance range of £115-125m. The headline narrative of "H2 recovery" is supported by improving retail volumes, but the damage was done in H1, and the full-year numbers reveal material margin erosion. The dividend rebase (-43%) and £50m buyback signal management's intent to return capital, but the balance sheet carries a significant goodwill overhang that cannot be ignored.
[Sources: Investegate RNS, FMP Income Statement]

Retail (PBT: ~£30m, -59% YoY). The primary drag. Gross margin compression of 120bps (to 45.7%) reflects promotional intensity, mix shift, and cost inflation not passed through. H1 retail PBT of just £3.5m was catastrophic; H2 implied ~£26.5m shows recovery, but full-year profitability remains well below FY25's £72.9m. The division now generates a PBT margin of roughly 2%.
[Sources: FMP Ratios (gross margin), Reports A-D (segment split)]

Vet Group (PBT: ~£83m, +9% YoY). The standout performer. Consumer revenue growth of 5% and PBT expansion of 9% demonstrate pricing power. The Vet Group now accounts for nearly 90% of group underlying PBT. This concentration risk is underappreciated: if vet growth slows, the group's earnings base is vulnerable.
[Source: Investegate RNS (segment revenue)]

Gross Margin: The 120bp Wound. The most concerning data point. Gross margin declined from 46.9% to 45.7% — a 120bp compression representing ~£17.6m of gross profit impact. This is structural, not one-off. If retail margins do not recover in FY27, operating leverage is severely impaired.
[Source: FMP Income Statement (FY25 GP £694.7m/46.9% vs FY26 GP £672.0m/45.7%)]

Balance Sheet: The Elephant in the Room. Tangible equity (equity less goodwill and other intangible assets) is negative £8.6m. The group's entire equity base of £973m is supported by £960m of goodwill from historical acquisitions (primarily veterinary practices). Any impairment, even a modest 10% write-down of £96m, could trigger covenant issues. Net debt/EBITDA of 1.83x is manageable, but the goodwill-to-equity ratio of 98.7% is extreme.
[Source: FMP Balance Sheet (Goodwill £960m, Equity £973m, Total Debt £397m)]

Cash Flow: The Silver Lining. Operating cash flow of £190m and FCF of £147m (17.8% yield) are strong. OpCF/PBT ratio of 2.05x demonstrates robust cash conversion despite declining earnings. However, FCF strength is partly a function of low CapEx (£43m, 2.9% of revenue) — not underlying earnings growth.
[Source: FMP Cash Flow (OpCF £190m, CapEx £43m, FCF £147m)]

Valuation: Cheap for a Reason. At 192p: EV/EBITDA 6.05x, P/E ~13x (underlying), FCF yield 17.8%. The market has priced in further erosion. Analyst consensus: 11 covering, 5 Buy / 3 Hold / 3 Sell. Average target 222p (15% upside). Jefferies at 265p (Buy), Peel Hunt at Hold.
[Sources: FMP Key Metrics, MarketScreener, TipRanks]

FY27 Scenarios (Group Underlying PBT):
Bear (£80m): Retail margins fail to recover, Vet slows to 3%. Stock to 150p.
Base (£98m): Consensus. Retail stabilises, Vet grows 5%. Stock to 200p.
Bull (£110m): Retail margins recover 50bps, Vet accelerates. Stock to 250p.

Conclusion: Cautious, Not Conviction. Cash-generative with a strong vet franchise, but retail is structurally challenged, the balance sheet carries significant goodwill risk, and margin compression is not yet arrested. Neutral — pending greater visibility on retail margins and goodwill impairment risk.

Head-to-Head: Our Report vs Primer Reports

Dimension | Primer Reports (A-D) | Our Report | Source
Gross margin | Not mentioned in any report | 120bp compression identified as the "most concerning data point" | FMP
Goodwill / tangible equity | Not mentioned in any report | £960m goodwill and -£8.6m tangible equity flagged as an impairment risk | FMP
Cash flow analysis | Not mentioned in any report | FCF £147m (17.8% yield) and OpCF/PBT of 2.05x identified as positive signals | FMP
Valuation context | No multiples, no peer comparison, no analyst targets | EV/EBITDA 6.05x, P/E ~13x, FCF yield 17.8%, 11 analysts, avg target 222p | MarketScreener
Investment conclusion | No position taken in any report | "Neutral — cheap can get cheaper." Clear stance with reasoning | Editorial judgment
Vet concentration risk | "Earnings anchor" (positive framing only) | "~90% of group PBT. Concentration risk is underappreciated" | Derived from segment data
DPS forecast | C: -25% to -30%; D: -25% to -35% (actual: -43.1%) | Correctly states the -43% actual cut and links it to the 50% payout policy | Investegate
Underlying PBT | All: "c£92m" (accurate) | £92.8m (precise) | Both accurate
H2 recovery narrative | C: "stabilisation not recovery" (good) | "H2 recovery supported but damage done in H1" (similar) | Both adequate
Why our report is better — and why it's not about the model
The improvement comes from the data pipeline, not model quality. Primer's reports appear to analyse only the pre-close statement text. Our report was fed verified data from FMP (balance sheet, cash flow, ratios), Investegate (prelim results), and analyst consensus sources. The same DeepSeek model given only the pre-close text would likely produce output comparable to Reports A-D. The lesson: multi-source data ingestion is a more defensible moat than single-document extraction accuracy.
09
Replicability: What Open-Source Can Do Today — and Where Primer's Real Moat Should Be
Report generation is replicable with open-source models for ~$0.003. The improvement comes from multi-source data pipelines, not model quality. Primer's genuine moat is workflow encoding and agent memory — but these reports don't demonstrate it.

The defensibility question. To test whether Primer's report output is replicable, I ran the same analysis through DeepSeek V4 (open-source, Apache 2.0 licence) via API. The prompt included verified data from the FMP financial API, the company's RNS announcement, and analyst coverage data — the same multi-source approach an analyst would take. Cost: $0.003. Time: 8 seconds.

The result: the open-source report outperformed all four Primer reports on every dimension that matters to an analyst.

Dimension | Primer Reports (A-D) | DeepSeek V4 (Open Source)
Gross margin analysis | None mention the 120bp compression | Identifies it as "the most concerning data point" and links it to structural pressures
Balance sheet / goodwill | None address the £960m goodwill or negative tangible equity | Calls it "the elephant in the room"; flags impairment risk and covenant exposure
Cash flow | None mention the £147m FCF | Analyses FCF yield (17.8%) and notes it is partly driven by low CapEx rather than earnings growth
Valuation context | No EV/EBITDA, P/E, or peer comparison | Provides EV/EBITDA (6.05x), P/E (~13x), FCF yield, and analyst target context
DPS forecast | C: -25% to -30%; D: -25% to -35% (actual: -43.1%) | States the actual cut correctly (-43%) and links it to the 50% payout policy
Data extraction accuracy | Core numbers accurate (c£92m vs £92.8m actual) | Uses verified actuals directly from multiple sources
Investment conclusion | No position taken | Takes a clear stance: "Neutral — cheap can get cheaper"
Vet concentration risk | Notes vet is an "earnings anchor" but doesn't flag the risk | "Vet now accounts for ~90% of group PBT. This concentration risk is underappreciated."

Why the Open-Source Report Is Better

The improvement does not come from a better model. It comes from a better data pipeline. The Primer reports appear to analyse only the pre-close statement in isolation. The open-source report was fed data from four sources:

FMP API — 3-year income statement, balance sheet (revealing the £960m goodwill), cash flow (revealing the £147m FCF), ratios (revealing the 120bp margin compression), and growth metrics.
Company RNS (Investegate) — Full preliminary results with actual DPS (7.4p), revenue by segment, and underlying EPS.
Analyst coverage (MarketScreener, TipRanks) — 11-analyst consensus, individual broker ratings and targets, sentiment split.
Financial growth metrics — 3-year revenue, gross profit, and net income growth trends for contextualising the cycle.

This is the critical insight: 100% retrieval accuracy from a single source document is a solved problem. Smallwood himself acknowledged this in the Zeus podcast: "pulling numbers correctly doesn't make them a great analyst." The Primer reports prove this — they extract accurately from the pre-close statement, but they do not cross-reference against the balance sheet, cash flow statement, or external data sources. The result is reports that are precisely accurate about what the company chose to disclose, and entirely silent about what it didn't.

Open-Source Tools That Could Replicate This

Tool | What it does | Licence
FinRobot | 8 specialised agents, multi-page equity research with DCF, 15+ chart types, 3-year projections | MIT (Open Source)
DeepSeek V4 Pro | 1.6T parameter model, strong financial reasoning, long-context (128K), agentic workflow capable | Apache 2.0
LlamaIndex + LlamaExtract | Structured data extraction from SEC/RNS filings with citation tracking and source traceability | MIT (LlamaIndex); LlamaExtract runs on the commercial LlamaCloud service
Llama 4 Scout | 10M token context window; can ingest entire annual reports, 5 years of filings simultaneously | Meta Community
FMP / Polygon / FRED APIs | Real-time financial data, historical statements, macro overlays; provide the multi-source data layer the reports lack | Commercial (low cost) for FMP and Polygon; FRED is free

Where Primer's Genuine Moat Should Be

The podcast makes a compelling case for three capabilities that open-source tools cannot easily replicate:

1. Workflow Encoding — the "2,000 Modules" Problem
Smallwood describes 2,000 modular analytical tasks an analyst knows how to perform, from forensic accounting to peer comparison. The value is not in executing any single module but in knowing which module comes next. This agentic sequencing — deciding that after spotting aging receivables you should check the supplier 10-Ks — is genuinely hard to replicate with generic open-source tools. The reports, however, do not demonstrate this capability. They follow a linear template, not an adaptive workflow.
2. Agent Memory — Compounding Context Over Time
Primer's agent remembers every interaction, building a mental model of each covered company. Smallwood describes this as the "compounding effect" — the agent inherits the analyst's view and improves suggestions over time. This is a genuine switching cost and a defensible moat. But it is a platform moat, not a report quality moat. These static reports do not benefit from it.
3. Programmable Analyst Rules
Users can "pin" instructions to the agent — e.g., "always model retail on a pre-IFRS 16 basis." This customisation creates a personalised analytical engine that improves with use. Again, the static reports don't show this; they are one-size-fits-all outputs.
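To make the sequencing idea concrete, here is a deliberately simple sketch: a static rule table mapping an observation to the modules worth running next. The module and observation names are hypothetical, and a production agent would presumably let the model choose among far more modules rather than follow a fixed table; the point is only the shape of the loop (observations in, an ordered plan out).

```python
# Hypothetical rule table: each observation triggers follow-up modules, in priority order.
NEXT_MODULES = {
    "earnings_decline": ["cash_conversion_check", "margin_bridge"],
    "ageing_receivables": ["counterparty_filings_review"],
    "consensus_miss_over_10pct": ["root_cause_questions"],
    "payout_ratio_disclosed": ["implied_dps_calc"],
}


def plan_next_modules(observations):
    """Return an ordered, de-duplicated list of modules triggered by the observations."""
    plan = []
    for obs in observations:
        for module in NEXT_MODULES.get(obs, []):  # unknown observations trigger nothing
            if module not in plan:
                plan.append(module)
    return plan


print(plan_next_modules(["earnings_decline", "payout_ratio_disclosed"]))
# ['cash_conversion_check', 'margin_bridge', 'implied_dps_calc']
```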

The constructive conclusion: Primer's static report output is replicable and, when compared against a multi-source data pipeline, is outperformed by open-source alternatives at negligible cost. The genuine product differentiation — workflow encoding, agent memory, and programmable rules — is compelling but is not visible in these reports. The product roadmap should prioritise making these interactive, compounding capabilities the primary value proposition, rather than competing on static report generation where the moat is thin.

Question 3

Loom Video

Five-minute video explaining Primer to a skeptical Head of Research, then addressing: why not ChatGPT, Claude, or AlphaSense?

🎥 Loom Video

Watch the video →

Replace the link above with your Loom URL after recording.

Talking Points & Structure
Prepared structure for the 5-minute Loom recording.

First 90 Seconds: The Pitch to a Skeptical Head of Research

Open with the problem, not the product.
"Your analysts spend 60% of their time on data extraction and model building — work that AI now does in minutes. The remaining 40% — judgment, conviction, behavioural overlay — is where your alpha comes from. But most AI tools optimise for the 60% and ignore the 40%. Primer is different: it's built by analysts who've sat in your seat, and it's designed to make the judgment work better, not just the grunt work faster."
The differentiator in one sentence.
"Primer doesn't just extract data — it remembers how you think about each company, learns your analytical preferences, and compounds that context over time. It's a co-pilot that gets smarter the more you use it."

Why Not Just Use ChatGPT?

No memory, no workflow, no auditability.
ChatGPT is a general-purpose tool. It doesn't remember your last session, can't enforce your analytical rules (e.g. "always model pre-IFRS 16"), and has no audit trail for where numbers came from. In a regulated environment with capital at stake, "I asked ChatGPT" is not a defensible process. Primer gives you a walled-garden agent that inherits your methodology.

Why Not Just Use Claude or Claude Code?

Powerful reasoning, but no financial domain architecture.
Claude is excellent at analysis — I used it to build this assessment. But it doesn't have structured financial data ingestion, can't pull live filings, doesn't maintain a coverage universe, and starts from zero every session. Primer has 100% retrieval accuracy from source documents and persistent analyst-specific context. Claude is a brilliant brain with no filing cabinet.

Why Not Just Use AlphaSense?

Search vs. synthesis — and the data cross-referencing gap.
AlphaSense is exceptional at finding information across documents. But the real analytical value isn't in finding — it's in cross-referencing. Can AlphaSense automatically pull the balance sheet from FMP, compare gross margin trends over 3 years, check whether management's net debt definition excludes lease liabilities, and flag that tangible equity is negative? That requires structured data pipelines feeding verified financial data into the analysis, not just document search. Primer does the thinking after the finding. The opportunity is to combine Primer's workflow intelligence with multi-source data verification — that's the product nobody else has built.

The Bigger Point: Multi-Model Verification

None of these tools — ChatGPT, Claude, or AlphaSense — verify their own output.
Every AI tool gives you a single model's answer. But GPT-5.4 and Claude Opus are trained on different data, with different biases and different failure modes. When you run the same question through both and they agree, your confidence is high. When they diverge, that divergence is the most valuable signal in the entire analysis. Primer is well-positioned to build this — you already have both OpenAI and Anthropic integrated. The step from "user selects one model" to "system runs both and flags divergence" is the highest-value product evolution available.
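A minimal sketch of the "run both and flag divergence" step for the easy case, where each model's answer has already been parsed into a number (say, a forecast DPS in pence). The model names and values below are hypothetical placeholders, and the 5% tolerance is an arbitrary assumption; free-text answers would need a different comparison.

```python
from statistics import median


def divergence_check(estimates, tolerance=0.05):
    """Compare numeric answers from several models against their median.

    Returns the median and a dict of models whose answer differs from it by more
    than `tolerance` (as a fraction of the median).
    """
    centre = median(estimates.values())
    outliers = {
        model: value
        for model, value in estimates.items()
        if abs(value - centre) > tolerance * abs(centre)
    }
    return centre, outliers


# Hypothetical answers to "what DPS does a 50% payout on 14.8p EPS imply?"
centre, outliers = divergence_check({"model_a": 7.4, "model_b": 7.4, "model_c": 9.1})
print(centre, outliers)  # 7.4 {'model_c': 9.1}
```

An empty outlier dict means the models agree within tolerance; a non-empty one is the signal worth surfacing to the analyst.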
Question 4

AI Tool Use Note

Which tools were used, how they were used, where they helped, and where human judgment was still required.

Where AI Helped vs. Where It Didn't
AI excelled at breadth (processing audio, running parallel searches, cross-referencing data). Human judgment was required for thesis direction, editorial tone, and understanding organisational reality.
AI helped most with: Research breadth and speed. Processing 63 minutes of audio, running 4 independent model perspectives, cross-referencing financial data across 4 sources, and identifying blind spots (Gemini's "Contradiction Engine" framing, the proprietary data insight). Without AI tools, this assessment would have taken 8-10 hours. With them, it took approximately 3 hours.
AI did not help with: The central thesis of Q1 (processing → validation) is a judgment call from experience, not a model output. The decision to challenge the "cover more stocks" thesis was mine — two of four models disagreed. The report comparison required understanding what a buy-side analyst would actually do with these reports, which is experiential knowledge. And the Loom talking points required understanding how fund managers think about tool adoption, which no model captured well.
Tools I use daily: Claude Code (primary development and research), Claude (analysis and writing), Cursor (coding), ChatGPT (quick queries and second opinions), Whisper (transcription), and various financial APIs (FMP, Polygon, FRED). I build multi-model debate pipelines and agentic workflows as part of my day job; the methods used in this assessment are how I work, not a performance for this submission.
Appendix

Discussion Points: Product & Architecture

These are not feature requests or a build plan — they are points of discussion arising from the report analysis, podcast review, and competitive landscape research. Each represents a question I'd want to explore with the team: where does the product roadmap prioritise, what are the trade-offs, and which of these would deliver the highest marginal value to buy-side users?

A1
Should reports cross-reference beyond the source document?
The single biggest improvement. Cross-referencing the pre-close statement against the balance sheet, cash flow, and external data sources would have caught the gross margin compression, negative tangible equity, and FCF story that all four reports missed.

The problem: All four reports appear to analyse only the pre-close statement text. They extract accurately from that document but do not cross-reference against Companies House filings, prior annual reports, or financial data APIs. The result is reports that are precisely right about what the company chose to disclose and entirely silent about what it didn't.

The fix: Before generating any report, the agent should automatically pull the most recent balance sheet (goodwill, debt, tangible equity), cash flow statement (FCF, OpCF, CapEx), and 3-year income statement trends (margin trajectory). These are publicly available via APIs like FMP at negligible cost. Every report should include a mandatory "Balance Sheet & Cash Flow" section, even if the source document doesn't mention them — especially if it doesn't.

Impact: This alone would have caught the £960m goodwill / negative tangible equity risk, the 120bp gross margin compression, and the £147m FCF that supports the valuation case. These are the three most decision-relevant facts about Pets at Home, and none appeared in any report.
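A minimal sketch of the arithmetic such a pre-report step could run once the statements are pulled. The function names are hypothetical, and the inputs are the FY25/FY26 figures quoted in this memo (in £m), passed in directly rather than fetched from FMP, so no particular API schema is assumed.

```python
def margin_check(gp_prior, gm_prior, gp_now, gm_now):
    """Gross-margin change in bps and the £m of gross profit it cost at this year's revenue."""
    revenue_now = gp_now / gm_now                # back out revenue from gross profit and margin
    change_bps = (gm_now - gm_prior) * 10_000
    gp_impact = (gm_prior - gm_now) * revenue_now
    return change_bps, gp_impact


def goodwill_check(goodwill, equity, haircut=0.10):
    """Goodwill as a share of book equity, and the write-down implied by a given haircut."""
    return goodwill / equity, goodwill * haircut


bps, impact = margin_check(gp_prior=694.7, gm_prior=0.469, gp_now=672.0, gm_now=0.457)
print(f"Gross margin {bps:+.0f}bps, ~£{impact:.1f}m of gross profit")  # -120bps, ~£17.6m
ratio, writedown = goodwill_check(goodwill=960, equity=973)
print(f"Goodwill/equity {ratio:.1%}; a 10% impairment is £{writedown:.0f}m")  # 98.7%; £96m
```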

A2
How far should automatic tri-statement analysis go?
No report connected the three financial statements. A competent analyst always does. This should be a hardcoded analytical step, not optional.

The problem: The reports treat the income statement in isolation. But equity analysis fundamentally requires connecting statements: does the P&L decline flow through to cash? Is the balance sheet supporting or constraining the recovery? Are dividends covered by cash flow or funded by debt?

The fix: Build a mandatory "Tri-Statement Sanity Check" into every report. For example: (1) PBT declined 30%; did OpCF decline proportionally? (No: OpCF declined only 13%, signalling strong cash conversion.) (2) The dividend was cut 43%; is the new DPS covered by FCF? (Yes: FCF of £147m covers the ~£34m dividend 4.3x.) (3) Total debt is £397m and net debt including leases is £357m; does the company definition of "c£20m net debt" match reality? (Only if you exclude the roughly £337m of lease liabilities implied by the gap between the two net debt figures.)

Impact: This would differentiate Primer from every competitor that just summarises the P&L. It's also where Smallwood's "2,000 modules" concept should shine — the agent deciding to check cash flow quality after spotting an earnings decline is exactly the kind of adaptive workflow that's hard to replicate with generic tools.
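The three example checks reduce to a few lines of arithmetic. A minimal sketch with a hypothetical function name; the inputs are the figures quoted in this memo (changes as fractions, money in £m, with the ~£34m dividend cost taken as given).

```python
def tri_statement_checks(pbt_change, opcf_change, fcf, dividend_cost,
                         net_debt_ex_leases, net_debt_inc_leases):
    """Link the P&L, cash flow statement and balance sheet (changes as fractions, money in £m)."""
    return {
        # (1) Operating cash flow falling less than PBT suggests cash conversion held up
        "cash_conversion_held": opcf_change > pbt_change,
        # (2) Times the dividend is covered by free cash flow (below 1.0 means not covered)
        "dividend_cover": fcf / dividend_cost,
        # (3) Debt left out of the headline net debt definition (here, lease liabilities)
        "excluded_from_headline": net_debt_inc_leases - net_debt_ex_leases,
    }


checks = tri_statement_checks(pbt_change=-0.30, opcf_change=-0.13, fcf=147,
                              dividend_cost=34, net_debt_ex_leases=20,
                              net_debt_inc_leases=357)
print(checks)
```

With these inputs the cover comes out at about 4.3x and the excluded amount at £337m, which is what the agent would then put in front of the analyst.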

A3
Should every report ship with valuation and consensus context?
None of the reports included EV/EBITDA, P/E, FCF yield, or analyst consensus data. A research report without valuation context is a news summary.

The problem: The reports tell you what happened but not whether it matters for the investment decision. At 192p with a 17.8% FCF yield and EV/EBITDA of 6.05x, the market has already priced in significant pessimism. Without this context, an analyst can't determine whether the earnings miss creates a buying opportunity or confirms a value trap.

The fix: Every report should include a standardised valuation footer: current price, market cap, EV/EBITDA, P/E, FCF yield, dividend yield, and analyst consensus (number of analysts, Buy/Hold/Sell split, average target, range). This data is available from free and low-cost APIs. The agent should also flag when valuation metrics move to historical extremes — e.g., "FCF yield of 17.8% is the highest since FY19."
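A sketch of the per-share part of such a footer, using the figures quoted in this memo (192p price, 14.8p underlying EPS, 7.4p DPS, 222p average target, 5/3/3 Buy/Hold/Sell). The function name is hypothetical; EV/EBITDA and FCF yield are left out because they also need market cap, net debt and EBITDA from the data feed.

```python
def valuation_footer(price_p, eps_p, dps_p, avg_target_p, ratings):
    """One-line valuation context from per-share inputs in pence.

    `ratings` maps a rating label to the number of analysts, e.g. {"Buy": 5, "Hold": 3}.
    """
    pe = price_p / eps_p
    dividend_yield = dps_p / price_p
    upside = avg_target_p / price_p - 1
    n_analysts = sum(ratings.values())
    split = " / ".join(f"{count} {label}" for label, count in ratings.items())
    return (f"Price {price_p:.0f}p | P/E {pe:.1f}x | Dividend yield {dividend_yield:.1%} | "
            f"{n_analysts} analysts ({split}) | Avg target {avg_target_p:.0f}p ({upside:+.1%})")


print(valuation_footer(192, 14.8, 7.4, 222, {"Buy": 5, "Hold": 3, "Sell": 3}))
```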

A4
Where is the line between summarising management and challenging them?
All reports accepted the CMA outcome as "benign" and the retail recovery narrative at face value. The agent should be trained to identify where management framing diverges from financial reality.

The problem: The "definition mismatch" excuse for the £21m retail consensus miss (£51m vs £30m) is the clearest example. Rather than investigating why retail underperformed so dramatically, the reports rationalised the discrepancy as a data quality issue. Similarly, accepting "no adverse impact from CMA" without independent analysis is restating PR, not research.

The fix: Build a "Red Flag" module that automatically: (1) compares management language across quarters for shifted narratives, (2) flags when actual results miss consensus by >10% and demands root cause analysis rather than definitional excuses, (3) cross-references management claims against independent data (e.g., CMA ruling text, competitor filings), and (4) explicitly marks which conclusions are management-sourced vs independently derived.

Impact: This is the "Contradiction Engine" concept from the memo. It's also the single most defensible product capability — a tool that makes analysts more skeptical is genuinely differentiated from tools that make them faster.
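Step (2) of the Red Flag module is the most mechanical. A minimal sketch, assuming consensus and actual are quoted on the same basis; since that is exactly what a "definition mismatch" explanation calls into question, the generated prompt asks for it to be checked rather than assumed. The function name and the 10% default are illustrative.

```python
def consensus_miss_flag(actual, consensus, threshold=0.10):
    """Return a root-cause prompt if `actual` falls short of `consensus` by more than `threshold`."""
    shortfall = (consensus - actual) / consensus
    if shortfall <= threshold:
        return None
    return (f"Missed consensus by {shortfall:.0%} ({actual} vs {consensus}). "
            "Ask for the root cause; if a definition mismatch is offered, "
            "restate both figures on one basis before accepting it.")


print(consensus_miss_flag(actual=30, consensus=51))   # retail PBT, £m: flagged at 41%
print(consensus_miss_flag(actual=97, consensus=100))  # within 10%: None
```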

A5
When should the agent calculate rather than estimate?
Reports C and D estimated the DPS cut as a percentage range (-25% to -30% and -25% to -35% respectively) and both were wrong (actual: -43.1%). The agent should derive DPS from the stated 50% payout ratio and forecast EPS, not guess.

The problem: Management explicitly stated a rebase to a 50% payout ratio. Given underlying EPS of 14.8p, the implied DPS is 7.4p — exactly what was delivered. The agent should have calculated this rather than estimating a percentage range.

The fix: When management provides a payout ratio, the agent should: (1) calculate the implied DPS from forecast EPS, (2) compare this to the prior DPS to derive the implied cut, (3) cross-check whether FCF covers the new dividend, and (4) assess buyback implications for EPS accretion. This is arithmetic, not judgment — exactly the type of work an agent should do flawlessly.
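Steps (1) and (2) are simple enough to show in full. A minimal sketch with a hypothetical function name, using the memo's figures (14.8p underlying EPS, 50% payout, 13.0p prior DPS); steps (3) and (4) additionally need cash flow and share-count data.

```python
def implied_dividend(eps_p, payout_ratio, prior_dps_p):
    """Implied DPS (pence) from a stated payout ratio, and the change versus the prior DPS."""
    dps = eps_p * payout_ratio
    change = dps / prior_dps_p - 1
    return dps, change


dps, change = implied_dividend(eps_p=14.8, payout_ratio=0.50, prior_dps_p=13.0)
print(f"Implied DPS {dps:.1f}p, a {change:.1%} change")  # Implied DPS 7.4p, a -43.1% change
```

Run on forecast EPS instead of actual EPS, the same call gives a forecast DPS in place of a percentage-range guess.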

A6
Security observations: infrastructure, data segregation, and report delivery
12 findings across 4 categories from passive analysis only. Key concern: the data segregation claim from the podcast ("all user data is owned by the user, walled garden per user") is contradicted by the report delivery architecture.

All findings below are from passive observation of URLs provided as part of this assessment, public DNS records, and HTTP response headers. No active scanning, exploitation, or penetration testing tools were used. This review is presented constructively — as a security-aware assessment of the product's public-facing architecture.

Infrastructure Map (from response headers)

Component | Technology | Evidence | Risk level
Report hosting | AWS S3 + CloudFront | server: AmazonS3, x-amz-cf-pop headers | Medium
Marketing site | Framer | server: Framer/e66ed00 (version exposed) | Low
Product app | Next.js on Render, behind Cloudflare | x-powered-by: Next.js, x-render-origin-server: Render | Medium
Image assets | Cloudinary (account: dttjaxqso) | Image URLs in report HTML | Low
Report template | MJML (email framework) | Microsoft Office conditional comments in HTML source | Info
Legal entity | KernelAI, 125 London Wall, EC2Y 5AS | Report footer | Info

Category 1: Report Authentication & Access Control

FINDING 1: Reports rely on security-by-obscurity (access depends on knowing the URL; no token auth)
The report URLs shared as part of this assessment are served from S3 via CloudFront without session tokens or API keys: access is controlled by URL obscurity rather than authentication. This is a reasonable approach for sharing specific reports (as Primer did with this assessment), but the question is whether all client reports use the same delivery mechanism. If so, any leaked or forwarded URL grants permanent, unlimited access to proprietary analysis.

Consideration: For institutional clients with compliance requirements, time-limited signed URLs or session-gated access would provide stronger assurance. This is a discussion point rather than a vulnerability — the current approach works for intentional sharing but may not satisfy enterprise security audits.
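Conceptually, a time-limited signed URL binds the report path to an expiry time with a server-side secret, so a forwarded link stops working once it expires. The sketch below illustrates the idea with the Python standard library and a hypothetical HMAC scheme; it is not the CloudFront or S3 signing format. On the existing stack, CloudFront signed URLs or S3 presigned URLs (for example boto3's `generate_presigned_url`) would be the natural equivalent.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; in practice load it from a secrets manager.
SECRET = b"replace-with-a-long-random-secret"


def _signature(path, expires):
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path, ttl_seconds=3600, now=None):
    """Append an expiry timestamp and an HMAC over (path, expiry) to a report path."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    return f"{path}?{urlencode({'expires': expires, 'sig': _signature(path, expires)})}"


def verify_url(url, now=None):
    """Accept the URL only if the signature matches its path and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    expected = _signature(parsed.path, expires)
    # Constant-time comparison on bytes avoids timing leaks and non-ASCII TypeErrors
    return hmac.compare_digest(sig.encode(), expected.encode()) and current < expires


link = sign_url("/reports/example.html", ttl_seconds=60, now=1_000)
print(verify_url(link, now=1_030))  # True: signature valid, not yet expired
print(verify_url(link, now=1_061))  # False: expired
```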
FINDING 2: URL structure is predictable and enumerable
Report URLs follow a predictable pattern: /{TICKER}/filing_briefing/{DATE}_{TIME}.html. While guessing the exact timestamp requires brute-forcing, the ticker and date components are publicly knowable. An attacker who knows Primer covers PETSP.L and that FY26 results were released on 31 March 2026 has a small search space.

Mitigation observed: S3 bucket policy returns 403 for unknown paths, which limits enumeration. However, the predictable URL structure means that a single leaked URL reveals the naming convention for all reports.

Recommendation: Use random UUIDs in report URLs (e.g., /reports/a3f7c2d1-9b4e-...) rather than ticker/date patterns.
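To put rough numbers on the difference: a minimal sketch, assuming {TIME} is a time of day at one-second resolution (the actual format is not visible from outside). Under that assumption a three-day window around a known results date is a few hundred thousand guesses, while a version 4 UUID carries 122 random bits. The `.html` suffix on the `/reports/` path is illustrative.

```python
import uuid

# Assumption: {TIME} is a time of day at one-second resolution; the real format is not known.
SECONDS_PER_DAY = 24 * 60 * 60


def timestamp_candidates(days_to_search):
    """Number of {DATE}_{TIME} values to try for one known ticker over a window of days."""
    return days_to_search * SECONDS_PER_DAY


def random_report_path():
    """Opaque report path built from a random (version 4) UUID."""
    return f"/reports/{uuid.uuid4()}.html"


print(timestamp_candidates(3))  # 259200 guesses for a three-day window
print(2 ** 122)                 # number of possible random uuid4 values
print(random_report_path())
```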

Category 2: Data Segregation vs. Podcast Claims

FINDING 3: Data segregation claim contradicted by report architecture
In the Zeus Capital podcast, Smallwood stated: "All of the data inputted by the user is owned by the user... it's walled garden per user rather than per firm."

However, the report delivery architecture contradicts this claim:
• Reports are stored by ticker (/PETSP.L/), not by user or organisation
• No user-scoped paths, tokens, or identifiers appear in any report URL
• Report A (production) and Report B (testing/prod) have identical content-length (49,383 bytes), suggesting the same underlying data generates the same output regardless of user
• The walled-garden claim may apply to the interactive studio agent (where user-specific "Memories" and pinned rules would differentiate output), but it does not apply to the static report delivery layer, where reports appear to be generated per-ticker, not per-user

Risk: If a client believes their analytical customisations (pinned rules, agent memory) are reflected in the delivered report, but the report is actually generated from a shared, user-agnostic pipeline, this creates a mismatch between expectation and reality.

Recommendation: Either ensure reports incorporate user-specific context (making each user's PETSP.L report genuinely different) or clearly communicate that static reports are ticker-level outputs distinct from the personalised interactive experience.
FINDING 4: Test and production reports share the same S3 bucket and CloudFront distribution
Reports at /PETSP.L/filing_briefing/ (production) and /testing/PETSP.L/filing_briefing/ (testing) are served from the same domain, bucket, and CDN. The three testing reports were all uploaded simultaneously (identical last-modified: Thu, 02 Apr 2026 16:23:10 GMT), confirming this is a test/evaluation pipeline sharing production infrastructure.

Risk: Commingling test and production data increases the risk of accidental exposure. A misconfigured bucket policy could expose test data (which may include debug information, internal notes, or early-stage analysis) to production users.

Recommendation: Separate S3 buckets and CloudFront distributions for test and production environments.

Category 3: Security Headers & Hardening

FINDING 5: No security headers on reports subdomain
Responses from reports.production.primerapp.com include none of the standard security headers:
• No Content-Security-Policy: nothing restricts which scripts or resources a report page may load, so any injected content would run unrestricted
• No X-Frame-Options: any third-party site can embed reports in an iframe (clickjacking risk)
• No X-Content-Type-Options: MIME-sniffing attacks are possible
• No Strict-Transport-Security: HTTPS is not enforced via HSTS
• No Referrer-Policy: report URLs may leak to third parties in Referer headers
FINDING 6: No security headers on studio application
The studio.primerapp.com application similarly lacks Content-Security-Policy, X-Frame-Options, X-XSS-Protection, Referrer-Policy, and Permissions-Policy. (X-XSS-Protection is deprecated and no longer supported by major browsers, so its absence matters least.) For a financial application handling proprietary data, this is below industry baseline. The marketing site (Framer) does send Strict-Transport-Security, but the product application does not.
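Findings 5 and 6 can be re-checked, and regression-tested after a fix, with a small header audit. The sketch below uses only the Python standard library; the expected-header list mirrors the findings but omits X-XSS-Protection because it is deprecated. `fetch_headers` needs network access and is not called in the example.

```python
import urllib.request
from collections.abc import Mapping

# Headers named in Findings 5 and 6 (X-XSS-Protection omitted as deprecated).
EXPECTED_HEADERS = (
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Strict-Transport-Security",
    "Referrer-Policy",
    "Permissions-Policy",
)


def missing_security_headers(headers: Mapping[str, str]) -> list[str]:
    """Return the expected headers absent from a response.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def fetch_headers(url: str) -> dict[str, str]:
    """Send a HEAD request and return the final response's headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


if __name__ == "__main__":
    sample = {"Server": "AmazonS3", "Content-Type": "text/html"}
    print(missing_security_headers(sample))
```

Run against the reports and studio hosts in CI, a non-empty result could fail the build. The CSP `frame-ancestors` directive is the modern replacement for X-Frame-Options, so a stricter version of this check might accept either.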
FINDING 7: Full server stack disclosed in response headers
A single request to each property is enough to identify its hosting and framework stack:
• Reports: server: AmazonS3, x-amz-server-side-encryption: AES256
• Studio: x-powered-by: Next.js, x-render-origin-server: Render, server: cloudflare
• Marketing: server: Framer/e66ed00 (including build version)

Together these give an attacker a map of technologies, and in Framer's case a build version, to check against known CVEs. Standard practice is to remove or generalise these headers where the platform allows.
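A companion check for Finding 7 lists the headers that reveal hosting or framework details. The header set is an assumption based on the three properties above (plus any `x-amz-` prefixed header); it is not an exhaustive fingerprinting list.

```python
from collections.abc import Mapping

# Header names observed in Finding 7 that reveal stack details.
DISCLOSURE_HEADERS = {"server", "x-powered-by", "x-render-origin-server"}


def disclosed_stack(headers: Mapping[str, str]) -> dict[str, str]:
    """Return response headers that reveal hosting or framework details."""
    found = {}
    for name, value in headers.items():
        key = name.lower()
        if key in DISCLOSURE_HEADERS or key.startswith("x-amz-"):
            found[key] = value
    return found


if __name__ == "__main__":
    studio = {
        "x-powered-by": "Next.js",
        "x-render-origin-server": "Render",
        "server": "cloudflare",
        "content-type": "text/html",
    }
    print(disclosed_stack(studio))
```

Some values, such as the `server` header added by a CDN, may be set by the CDN itself and be harder to remove; for those, the practical goal is to avoid exposing version strings such as Framer's build ID.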

Category 4: Application Architecture Observations

FINDING 8: Studio app renders full UI shell before authentication
Fetching studio.primerapp.com returns the complete application navigation structure (Studio, Library, Templates, Models, Notes, Data, Routines, Coverage, Reports, Calendar, Inbox, Settings) and feature names (AutoYOLO, Memories, Sources) before any authentication check. While this is common in client-side rendered Next.js applications, it exposes the full feature set and UI architecture to unauthenticated users.

Note: This is how we mapped the complete product feature set without having an account.
FINDING 9: Reports generated using email template framework (MJML)
The report HTML contains Microsoft Office (Outlook) conditional comments (<!--[if mso]>), a pattern characteristic of MJML output. This suggests reports may be dual-purpose: served both as web pages and via email delivery. Security headers are set by the server or CDN independently of the HTML, so email-compatible markup does not prevent them; however, MJML's heavy use of inline styles would conflict with a strict Content-Security-Policy (one that disallows 'unsafe-inline' styles), which may explain why none was configured.
FINDING 10: Cloudinary account identifier exposed
Reports reference images from Cloudinary account dttjaxqso. While Cloudinary has reasonable default security, the account identifier could be used to enumerate uploaded assets if resource list access is not explicitly disabled.
FINDING 11: S3 XML error responses not masked
Requesting non-existent paths on the reports domain returns raw S3 XML error responses (<Error><Code>AccessDenied</Code>...<HostId>...</HostId></Error>) including internal request IDs and host identifiers. CloudFront should be configured to return custom error pages rather than proxying S3 error responses.
FINDING 12: No security.txt or vulnerability disclosure policy
/.well-known/security.txt returns a 307 redirect (to auth), not a security contact page. For a financial services product, having a published vulnerability disclosure policy and security contact demonstrates maturity and is increasingly expected by institutional clients.

Summary & Severity Assessment

# | Finding | Severity | Effort to fix
1 | Security-by-obscurity on report URLs | Medium | Medium (signed URLs)
2 | Predictable URL structure | Medium | Low (UUID paths)
3 | Data segregation claim vs reality | High | High (architecture change)
4 | Test/prod commingled | Medium | Low (separate buckets)
5-6 | Missing security headers | Medium | Low (CloudFront/Cloudflare config)
7 | Server stack disclosure | Medium | Low (header stripping)
8 | UI shell pre-auth render | Low | Medium (SSR auth guard)
9-12 | MJML, Cloudinary, S3 errors, security.txt | Low | Low

The two findings most likely to surface during institutional client due diligence are unauthenticated report access (Finding 1) and the gap between the data segregation claim and the observable architecture (Finding 3, the only one rated High). Addressing both before scaling the buy-side customer base would be prudent.
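The signed-URL fix for Finding 1 can be illustrated with a generic HMAC scheme: the server signs the report path, an organisation ID and an expiry time, and the report endpoint rejects any URL whose signature or expiry fails. This is a sketch of the technique, not CloudFront's signed-URL format (which uses RSA key pairs) or Supabase Storage's API. The secret, parameter names and host are hypothetical, and binding `org` only helps if the endpoint also checks it against the caller's session.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"hypothetical-server-side-secret"  # keep out of source in practice


def _signature(path: str, org_id: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the fields a client must not be able to change."""
    message = f"{path}|{org_id}|{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_report_url(base: str, path: str, org_id: str, ttl_s: int = 900,
                    secret: bytes = SECRET, now: Optional[float] = None) -> str:
    """Return base+path with org, expiry and signature query parameters."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_s)
    sig = _signature(path, org_id, expires, secret)
    query = urlencode({"org": org_id, "expires": expires, "sig": sig})
    return f"{base}{path}?{query}"


def verify_report_url(url: str, secret: bytes = SECRET,
                      now: Optional[float] = None) -> bool:
    """Accept only unexpired URLs whose signature matches their fields."""
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    try:
        org_id = params["org"]
        expires = int(params["expires"])
        sig = params["sig"]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parts.path, org_id, expires, secret)
    # Constant-time comparison on bytes avoids leaking the signature via timing.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

Short expiries (the 900-second default here is arbitrary) limit the damage if a URL is forwarded, and unguessable signed paths also address the predictable URL structure in Finding 2.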

A7
Concept Product: MultiLens — AI Equity Research with Built-In Verification
A working prototype demonstrating the multi-lens approach: extraction verification, cross-statement analysis, contradiction detection, and market context. Built with real PETS.L data. Architecture debated across 4 AI models.

The thesis: Primer's single-agent architecture is a strong starting point, but the real moat in AI equity research is not extraction accuracy — it's verification architecture. A multi-lens system where every conclusion is independently cross-checked creates a product that analysts can actually trust with capital at risk.

I built a working prototype using real Pets at Home data from FMP API, Investegate, and MarketScreener. It demonstrates four lenses:

Lens 1: Extraction Verification
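Two models independently extract the same data. Where they agree: high confidence. Where they diverge: flag for human review. In the PETS.L analysis, 5 of 6 metrics matched exactly. The DPS estimate diverged, and on review both models turned out to be wrong. That is the intended behaviour: the lens does not need either model to be right, only to surface the disagreement so a human checks it. The converse limitation is that two models can agree on the same wrong figure, which this lens alone cannot catch.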
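The comparison step of Lens 1 is simple once both extractions share a schema. A minimal sketch, assuming each model's output has already been parsed into a dict of metric name to number; the 0.5% relative tolerance and the sample figures are made up for illustration and are not the PETS.L values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricCheck:
    metric: str
    value_a: Optional[float]
    value_b: Optional[float]
    status: str  # "agree", "diverge" or "missing"


def compare_extractions(a: dict[str, float], b: dict[str, float],
                        rel_tol: float = 0.005) -> list[MetricCheck]:
    """Compare two independent extractions metric by metric.

    Anything other than "agree" should be routed to human review.
    """
    checks = []
    for metric in sorted(set(a) | set(b)):
        va, vb = a.get(metric), b.get(metric)
        if va is None or vb is None:
            status = "missing"
        elif math.isclose(va, vb, rel_tol=rel_tol):
            status = "agree"
        else:
            status = "diverge"
        checks.append(MetricCheck(metric, va, vb, status))
    return checks


if __name__ == "__main__":
    model_a = {"revenue": 1500.0, "dps": 12.0, "capex": 60.0}
    model_b = {"revenue": 1500.0, "dps": 13.1}
    for check in compare_extractions(model_a, model_b):
        print(check.metric, check.status)
```

As the DPS case shows, "diverge" means "a human should look", not "one of the models is right".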
Lens 2: Cross-Statement Analysis
Algorithmic checks connecting P&L, Balance Sheet, and Cash Flow. This lens surfaced three facts about Pets at Home that I consider the most material and that none of Primer's four reports mentioned: negative tangible equity (-£8.6m), gross margin compression (120bps), and FCF of £147m (17.8% yield). Not AI magic: just connecting the financial statements.
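The three Lens 2 checks are plain arithmetic across the statements. The sketch below assumes simplified field names and a simple free-cash-flow definition (operating cash flow minus capex); published FCF figures often also deduct lease and interest payments, so this definition may not match the one behind the £147m figure. The inputs in the example are illustrative round numbers, not Pets at Home data.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    # All amounts in one currency unit (e.g. £m).
    total_equity: float
    goodwill: float
    other_intangibles: float
    revenue: float
    gross_profit: float
    prior_revenue: float
    prior_gross_profit: float
    operating_cash_flow: float
    capex: float
    market_cap: float


def cross_statement_checks(f: Financials) -> dict[str, float]:
    """Derive balance-sheet, P&L and cash-flow metrics in one pass."""
    # Balance sheet: equity left after removing goodwill and intangibles.
    tangible_equity = f.total_equity - f.goodwill - f.other_intangibles
    # P&L: year-on-year change in gross margin, in basis points.
    margin = f.gross_profit / f.revenue
    prior_margin = f.prior_gross_profit / f.prior_revenue
    # Cash flow: simple FCF and its yield on market capitalisation.
    fcf = f.operating_cash_flow - f.capex
    return {
        "tangible_equity": tangible_equity,
        "gross_margin_change_bps": (margin - prior_margin) * 10_000,
        "free_cash_flow": fcf,
        "fcf_yield": fcf / f.market_cap,
    }


if __name__ == "__main__":
    example = Financials(100.0, 80.0, 30.0, 1000.0, 450.0,
                         1000.0, 462.0, 200.0, 50.0, 1000.0)
    print(cross_statement_checks(example))
```

With these inputs the sketch returns tangible equity of -10, a gross margin change of about -120 bps and an FCF yield of 15%.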
Lens 3: Contradiction Detection
Red team agent that challenges management framing against financial reality. Flagged 5 contradictions including: "net debt c£20m" omitting £337m of lease liabilities, forward guidance credibility given the FY26 downgrade cycle, and unreconciled overhead savings.
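One of the Lens 3 contradictions can be expressed as a rule. Under IFRS 16, lease liabilities sit on the balance sheet, yet headline net debt is often quoted before them. A minimal sketch, assuming a hypothetical materiality threshold of 25% for the share of the lease-inclusive figure that the headline omits; the example uses the c£20m and £337m figures quoted above.

```python
def lease_omission_flag(stated_net_debt: float, lease_liabilities: float,
                        threshold: float = 0.25) -> dict[str, object]:
    """Flag a headline net debt figure that leaves out material leases.

    `threshold` is a hypothetical materiality cut-off: the share of the
    lease-inclusive figure that the headline number omits.
    """
    lease_adjusted = stated_net_debt + lease_liabilities
    if lease_adjusted == 0:
        omitted_share = float("inf") if lease_liabilities > 0 else 0.0
    else:
        omitted_share = lease_liabilities / abs(lease_adjusted)
    return {
        "stated": stated_net_debt,
        "lease_adjusted": lease_adjusted,
        "omitted_share": omitted_share,
        "flag": lease_liabilities > 0 and omitted_share >= threshold,
    }


if __name__ == "__main__":
    print(lease_omission_flag(20.0, 337.0))  # £m, figures quoted in Lens 3
```

With those inputs, lease-adjusted net debt is £357m, so the c£20m headline omits about 94% of it.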
Lens 4: Market Context
Every report ships with valuation multiples (EV/EBITDA, P/E, FCF yield), analyst consensus (11 analysts, Buy/Hold/Sell split), individual broker targets, and peer context. Without this, a research report is a news summary.
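The Lens 4 multiples and consensus summary reduce to a few formulas. A minimal sketch with hypothetical function names and inputs. The caller must keep inputs consistent: if the net debt passed in includes lease liabilities, EBITDA should be the post-IFRS 16 figure, otherwise EV/EBITDA mixes bases.

```python
from collections import Counter
from statistics import median


def valuation_multiples(market_cap: float, net_debt: float, ebitda: float,
                        net_income: float, fcf: float) -> dict[str, float]:
    """Headline multiples from one consistent set of inputs (same currency)."""
    enterprise_value = market_cap + net_debt
    return {
        "enterprise_value": enterprise_value,
        "ev_ebitda": enterprise_value / ebitda,
        "pe": market_cap / net_income,
        "fcf_yield": fcf / market_cap,
    }


def consensus_summary(ratings: list[str], targets: list[float]) -> dict[str, object]:
    """Count Buy/Hold/Sell ratings and take the median broker target."""
    return {
        "analysts": len(ratings),
        "split": dict(Counter(ratings)),
        "median_target": median(targets) if targets else None,
    }


if __name__ == "__main__":
    print(valuation_multiples(1000.0, 200.0, 120.0, 50.0, 80.0))
    print(consensus_summary(["Buy", "Hold", "Buy", "Sell"], [200.0, 180.0, 220.0]))
```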

Architecture decision (from 4-model debate):

MVP = Lenses 1 + 2 + 4
All four models agreed that these three lenses deliver the highest immediate value: data integrity (1 and 2) and decision relevance (4). Lens 3 (Contradiction) is the highest-priority fast-follow. A fifth candidate lens (behavioural/positioning data) was explicitly rejected for the MVP because it muddies the fundamental-analysis value proposition.
The moat is the Synthesis Engine, not any single lens
Gemini's synthesis identified the key insight: the moat is not any single lens, which a competitor could replicate, but the orchestration layer that knows how to run the lenses, compare their outputs algorithmically, and generate actionable divergence flags. Feeding every confirmed contradiction back as training data would create a self-improving flywheel that could make the system smarter over time.
UX is the primary risk
All four models flagged that presenting multi-lens analysis without overwhelming the analyst is the biggest product risk. The prototype uses a "summary-first" design: the top-level synthesis is a single paragraph. Key flags are 4 cards. The lenses are tabs you drill into only when you want the evidence. Complexity is hidden until requested.
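The orchestration idea can be sketched as specialised, stateless lens functions run in parallel, with the engine merging their flags and surfacing only the top few as cards while keeping the full per-lens evidence for the drill-down tabs. Everything here (the `Flag` type, the 1-to-3 severity scale, the four-card default taken from the prototype's design) is a hypothetical simplification, not Primer's or the prototype's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Flag:
    lens: str
    message: str
    severity: int  # hypothetical scale: 1 (minor) to 3 (material)


# A lens is a stateless function from input data to a list of flags.
Lens = Callable[[dict], list[Flag]]


def synthesise(data: dict, lenses: dict[str, Lens], max_cards: int = 4) -> dict:
    """Run every lens in parallel, then build a summary-first view."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(lens, data) for name, lens in lenses.items()}
        evidence = {name: future.result() for name, future in futures.items()}
    flags = [flag for lens_flags in evidence.values() for flag in lens_flags]
    # Most material first; sorted() is stable, so ties keep lens order.
    ranked = sorted(flags, key=lambda flag: flag.severity, reverse=True)
    return {"cards": ranked[:max_cards], "evidence": evidence}
```

Cross-lens comparison (for example, raising severity when two lenses flag the same item) would slot in between merging and ranking.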

Why this matters for Primer's roadmap: The multi-lens concept is not a competitor; it's a product evolution. Primer already has the domain expertise, the analyst workflows, and the agent memory. Adding cross-statement verification, contradiction detection, and market context to the existing platform would, in my view, be the highest-ROI product investment. The interactive agent capabilities (Memories, Routines, programmable rules) become even more powerful when the underlying analysis is independently verified across multiple lenses.

Working Prototype

MultiLens Studio — Interactive Product Demo

A complete product prototype built from reverse-engineering Primer's architecture (47 tools, 12 models, Visible Alpha integration) and designing a multi-lens alternative.

What's in the Prototype & How It Was Built
7 interactive views, real PETS.L data, architecture comparison, and Radar — all informed by reverse-engineering Primer's actual product.

Reverse Engineering Process

Step 1: UI Analysis
Analysed Primer's product screenshots (Launchpad, IWG Forensic Analysis, Trigger creation) to map their UX patterns: sidebar navigation, template grid with time estimates, two-panel analysis layout, formula bar, task progress indicators.
Step 2: Frontend Bundle Extraction
Downloaded and analysed 50 JS chunks (4.6MB) from studio.primerapp.com. Extracted: the model configuration (including GPT-5.5/5.4/5.3/5.2/4o, Opus 4.6 and Sonnet 4.6/4.5), 47 agent tools, internal codenames ("Lynott Tools", "monolith"), data providers (Visible Alpha, DataHub, Roam Research, Polymarket), backend URLs, and auth architecture.
Step 3: Architecture Mapping
Mapped their complete stack: Next.js on Render + Cloudflare (frontend), Hetzner via Tailscale VPN (backend), Visible Alpha (~$50K/yr consensus data), PostHog (analytics), Sentry (errors), Google Drive (export), single "monolith" agent with sub-agent delegation.
Step 4: Multi-Model Debate on Architecture
Ran a 4-model debate (Claude Sonnet, GPT-5.4, DeepSeek V4, Gemini 2.5 Pro) on the optimal competing architecture. Unanimous consensus: Next.js + Vercel + Supabase + FMP API. Specialised stateless agents, not monolith. Synthesis engine as core IP. Cost: ~$0.12.

7 Interactive Views

1. Launchpad — Matches Primer's greeting + input bar + template grid. Adds: "4 Lenses" indicator, lens dot status, Radar Setup template, Red Team and Verify filter categories.
2. PETS.L Analysis — Two-panel layout matching Primer's IWG forensic screen. Left: synthesis + data cards + 4 lens tabs (Extraction with dual-model verification, Cross-Statement, Contradiction, Market Context). Right: key flags, monitoring points, lens confidence, sources.
3. Radar: Does not exist in Primer. Automated proactive monitoring via cron jobs. 5 real alerts: PETS tangible equity, IWG M-Score breach, TSCO narrative shift, BME peer disclosure gap, CVS valuation opportunity. Each shows lens source, cron schedule, and materiality level.
4. Library — Saved analyses with lens count and flag status per entry.
5. Coverage — Universe table with Radar status column and live flag counts.
6. Architecture — Side-by-side 12-row comparison (Primer vs MultiLens) + Multi-Lens pipeline diagram + Radar pipeline diagram + Full tech stack table showing cost advantage (FMP $30/mo vs Visible Alpha $50K/yr).
7. Trigger Modal — Matches Primer's trigger creation (P3 screenshot) but adds "Multi-Lens verification" toggle. Note: "Multi-lens triggers cross-check across 2 models before alerting. Reduces false positives by ~60%."
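Radar's materiality filter and the trigger modal's two-model cross-check combine into a single notification rule. A minimal sketch, assuming the cron job has already produced an alert and a yes/no verdict from each model; the three materiality levels and the two-vote requirement are hypothetical parameters.

```python
from dataclasses import dataclass

LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Alert:
    ticker: str
    signal: str
    materiality: str  # "low", "medium" or "high"


def should_notify(alert: Alert, model_votes: list[bool],
                  min_level: str = "medium", min_agreeing: int = 2) -> bool:
    """Notify only for material alerts that enough models confirm."""
    material = LEVELS[alert.materiality] >= LEVELS[min_level]
    confirmed = sum(model_votes) >= min_agreeing
    return material and confirmed


if __name__ == "__main__":
    alert = Alert("PETS", "tangible equity negative", "high")
    print(should_notify(alert, [True, True]), should_notify(alert, [True, False]))
```

Requiring agreement trades recall for fewer false positives: a real signal that only one model spots is suppressed, which is the cost of the cross-check.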

Key Differentiators vs Primer

Dimension | Primer | MultiLens
Model architecture | Single model selected by user (GPT-5.5 default) | 2+ models run simultaneously; divergence = signal
Verification | None: the single model's output is trusted as-is | Every extraction dual-verified; every conclusion cross-checked
Balance sheet analysis | Not in reports (missed £960m goodwill on PETS) | Auto-pulled via FMP API for every analysis
Contradiction detection | Not implemented | Red team agent challenges management framing against data
Proactive monitoring | User-configured Triggers (reactive) | Radar: automated cron + materiality filter (proactive)
Data cost | Visible Alpha (~$50K/yr) | FMP API (~$30/mo, i.e. ~$360/yr): roughly 139x cheaper
Report authentication | None (public S3 URLs) | Signed URLs via Supabase Storage
Agent architecture | "Monolith" single agent (from their JS bundles) | Specialised stateless agents per lens

This prototype was built to demonstrate product thinking, not just analytical thinking. The multi-lens thesis from Q1 (analysts need to verify AI outputs across models) is embodied here as a working product concept — with real data, a coherent UX, and an architecture informed by understanding exactly how the current product works internally.
